1.  Introduction


Consider a p-th order autoregressive model

$$Y(t) = \sum_{j=1}^{p} \theta_j Y(t-j) + e(t), \tag{1}$$

where $t = 1,2,\ldots,n$, $\theta_j \in R$, $Y(t)$ is the observation at time $t$, and $e(1), e(2), \ldots, e(n)$ are n.i.d. $(0, \tau^{-1})$, $\tau > 0$, so that $\tau$ is the precision of the errors. The parameters $\theta_1, \theta_2, \ldots, \theta_p$, and $\tau$ are unknown and the 'initial' observations $Y(0), Y(-1), \ldots, Y(1-p)$ are assumed to be known constants.
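To make the model concrete, the following minimal sketch simulates a sample from (1). It is an illustration only: the function name `simulate_ar`, the argument layout, and the use of Python's standard `random` module are assumptions of this example and are not part of the original study.

```python
import math
import random


def simulate_ar(theta, tau, n, y_init, seed=None):
    """Simulate Y(1), ..., Y(n) from the AR(p) model (1).

    theta  : [theta_1, ..., theta_p], the autoregressive coefficients
    tau    : error precision, so each e(t) has variance 1 / tau
    y_init : [Y(0), Y(-1), ..., Y(1-p)], the known initial observations
    """
    p = len(theta)
    if len(y_init) != p:
        raise ValueError("need exactly p initial observations")
    rng = random.Random(seed)
    sd = 1.0 / math.sqrt(tau)
    lags = list(y_init)  # lags[j-1] holds Y(t-j) for the current t
    sample = []
    for _ in range(n):
        mean = sum(th * lag for th, lag in zip(theta, lags))
        y_t = mean + rng.gauss(0.0, sd)
        sample.append(y_t)
        lags = [y_t] + lags[:-1]  # shift the lag window forward by one step
    return sample
```

For instance, `simulate_ar([0.25], 1.0, 25, [0.0], seed=1)` draws 25 observations from a first order model with coefficient 0.25, unit precision, and Y(0) = 0.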

Given a sample $S_n = [Y(1), \ldots, Y(n)]'$ of observations, how does one forecast future observations $W(j) = Y(n+j)$, where $j = 1,2,\ldots,k$, and $k$ is an integer such that $k \ge 1$? Of course there are many non-Bayesian techniques of forecasting in the literature, including Box and Jenkins, exponential smoothing, and stepwise autoregression, all of which are explained in Granger and Newbold (1977). In addition, Bayesian techniques of forecasting have been developed by Harrison and Stevens (1976), Zellner (1971) and Chow (1975), who base their forecasts on the Bayesian predictive distribution of the future observations.

The Bayesian predictive distribution is the conditional distribution of the future observations $W_k = [W(1), \ldots, W(k)]'$ given the past observations $S_n$ and plays a prominent role in Bayesian methodology. Aitchison and Dunsmore (1975) develop the Bayesian predictive density for many of the traditional statistical models, but curiously, it has not often been used in time series analysis, except by Zellner and Chow, where the former author uses it with first and second order autoregressive processes for $k = 1$ (one-step ahead prediction), and Chow derives the predictive moments for the general case.

The purpose of this study is to characterize the predictive distribution of $W_k$ given $S_n$. It will be shown that when $k = 1$, the predictive distribution of $W_1$ is a univariate t and that when $k = 2$, the conditional predictive distribution of $W(2)$ given $W(1)$ is a univariate t and that the marginal predictive distribution of $W(1)$ is also a t. In general, assuming a normal-gamma conjugate prior density for the parameters, the predictive density of $W_k$ is a product of univariate t densities.

This study is concluded with a numerical demonstration of one-step ahead forecasting with a first order autoregressive model. Using a normal-gamma prior density for the two parameters of the model, the mean and variance of the predictive distribution of $W(1) = Y(n+1)$ are computed for a wide variety of models, sample sizes, and values of the prior mean of the autoregressive parameter.

2.  The  Prior  and  Posterior  Analyses 

Using the Bayesian approach, one must specify a prior density for $\theta = (\theta_1, \theta_2, \ldots, \theta_p)'$ and $\tau$, and it often is convenient to use either a Jeffreys' improper prior

$$\xi_1(\theta,\tau) \propto 1/\tau, \quad \tau > 0, \ \theta \in R^p \tag{2}$$

or the normal-gamma conjugate prior density

$$\xi_2(\theta,\tau) \propto \xi_{21}(\theta|\tau)\,\xi_{22}(\tau), \quad \tau > 0, \ \theta \in R^p, \tag{3}$$

where the conditional prior density of $\theta$ given $\tau$ is

$$\xi_{21}(\theta|\tau) \propto \tau^{p/2} \exp\left\{-\frac{\tau}{2}(\theta-\mu)'P(\theta-\mu)\right\}, \quad \theta \in R^p \tag{4}$$

and

$$\xi_{22}(\tau) \propto \tau^{\alpha-1} e^{-\tau\beta}, \quad \tau > 0 \tag{5}$$

is the marginal prior density of $\tau$.

Thus, a priori, $\theta$ given $\tau$ has a normal distribution with mean vector $\mu$ and precision matrix $\tau P$, where $P$ is a symmetric positive definite matrix, and $\tau$ has a gamma distribution with parameters $\alpha > 0$ and $\beta > 0$. This implies the marginal prior density of $\theta$ is

$$\xi_3(\theta) \propto [2\beta + (\theta-\mu)'P(\theta-\mu)]^{-(p+2\alpha)/2}, \quad \theta \in R^p \tag{6}$$

which is a t density with $2\alpha$ degrees of freedom, location $\mu$, and precision matrix $(2\alpha)P(2\beta)^{-1}$. Note that the parameters of the marginal prior distribution of $\tau$ are also parameters of the marginal prior density of $\theta$; hence one's prior information about $\theta$ depends on one's prior opinion of $\tau$.

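When the prior density $\xi_2$ of the parameters $\theta$ and $\tau$ is combined with the conditional density of the observations $S_n$ given $\theta$ and $\tau$, which is

$$f(S_n|\theta,\tau) \propto \tau^{n/2} \exp\left\{-\frac{\tau}{2} \sum_{t=1}^{n} \Big[Y(t) - \sum_{j=1}^{p} \theta_j Y(t-j)\Big]^2\right\}, \quad S_n \in R^n \tag{7}$$

the product is the posterior density of $\theta$ and $\tau$, namely,

$$\xi(\theta,\tau|S_n) \propto \tau^{\frac{n+p+2\alpha}{2}-1} \exp\left\{-\frac{\tau}{2}\Big\{2\beta + (\theta-\mu)'P(\theta-\mu) + \sum_{t=1}^{n} \Big[Y(t) - \sum_{j=1}^{p} \theta_j Y(t-j)\Big]^2\Big\}\right\}. \tag{8}$$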
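For the first order case $p = 1$, completing the square in $\theta$ inside the exponent of (8) and integrating out $\tau$ gives a t density for $\theta$ with $n + 2\alpha$ degrees of freedom. The sketch below computes its location and variance. It is a hedged illustration derived from (8) under the assumption $p = 1$; the function name `ar1_posterior` and the data layout are hypothetical and do not come from the original study.

```python
def ar1_posterior(y, mu, P, alpha, beta):
    """Marginal posterior of theta for an AR(1) model under the prior (3).

    y = [Y(0), Y(1), ..., Y(n)], with Y(0) the known initial observation.
    Returns (location, variance, degrees_of_freedom) of the t density
    obtained from (8) with p = 1.
    """
    n = len(y) - 1
    sum_lag_sq = sum(v * v for v in y[:-1])                     # sum of Y^2(t-1)
    sum_cross = sum(y[t] * y[t - 1] for t in range(1, n + 1))   # sum of Y(t)Y(t-1)
    sum_sq = sum(v * v for v in y[1:])                          # sum of Y^2(t)

    a = P + sum_lag_sq                      # coefficient of theta^2 in the exponent
    location = (P * mu + sum_cross) / a     # centre after completing the square
    # Terms of the exponent that are left over once the square is completed.
    remainder = 2.0 * beta + P * mu * mu + sum_sq - a * location * location
    dof = n + 2.0 * alpha
    variance = remainder / ((dof - 2.0) * a)  # requires dof > 2
    return location, variance, dof
```

With a nearly flat prior (small `P`) the location approaches the ratio of the cross-product sum to the lagged sum of squares, as one would expect.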


3.  The  Bayesian  Predictive  Distribution 

The Bayesian predictive density of $W_k$ (conditional on $S_n$) is

$$g(W_k|S_n) = \int_{\Omega} \xi(\theta,\tau|S_n)\, f(W_k|S_n,\theta,\tau)\, d\theta\, d\tau, \quad W_k \in R^k \tag{9}$$

where $\Omega = \{(\theta,\tau): \theta \in R^p, \tau > 0\}$ and $f(W_k|S_n,\theta,\tau)$ is the conditional density of the $k$ future observations given $S_n$, $\theta$, and $\tau$. It is seen that the Bayesian predictive density of $W_k$ is the average (with respect to the posterior distribution of the parameters) of the conditional predictive density of $W_k$ given $\theta$ and $\tau$.

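The integrand of (9) is proportional to $\xi_2(\theta,\tau)\, f(S_n|\theta,\tau)\, f(W_k|S_n,\theta,\tau)$; thus the predictive density of $W_k$ is the average (with respect to the prior distribution) of the joint distribution of $S_n$ and $W_k$ given $\theta$ and $\tau$, which has density

$$f(S_n,W_k|\theta,\tau) \propto \tau^{\frac{n+k}{2}} \exp\left\{-\frac{\tau}{2}\Big\{\sum_{t=1}^{n}\Big[Y(t) - \sum_{j=1}^{p}\theta_j Y(t-j)\Big]^2 + \sum_{s=1}^{k}\Big[W(s) - \sum_{j=1}^{p}\theta_j W(s-j)\Big]^2\Big\}\right\} \tag{10}$$

where $S_n \in R^n$, $W_k \in R^k$, and $W(-i) = Y(n-i)$, $i = 0,1,2,\ldots$.

The joint distribution of $S_n$, $W_k$, $\theta$, and $\tau$ is

$$f(S_n,W_k,\theta,\tau) = f(S_n,W_k|\theta,\tau)\,\xi_2(\theta,\tau), \quad S_n \in R^n, \ W_k \in R^k, \ \theta \in R^p, \ \tau > 0, \tag{11}$$

and

$$f(S_n,W_k) = \int_{\Omega} f(S_n,W_k,\theta,\tau)\, d\theta\, d\tau, \quad S_n \in R^n, \ W_k \in R^k \tag{12}$$

is the joint density of the past and future observations, and the Bayesian predictive distribution of $W_k$ will be identified from this density, because

$$g(W_k|S_n) \propto f(S_n,W_k), \quad W_k \in R^k. \tag{13}$$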


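It can be shown that if $k \ge 1$,

$$g(W_k|S_n) \propto g_1(W_{k-1},S_n)\, g_2(W_k,S_n), \tag{14}$$

where

$$g_1(W_{k-1},S_n) = |A|^{-1/2}, \quad W_{k-1} \in R^{k-1}, \tag{15}$$

$$g_2(W_k,S_n) = [C - B'A^{-1}B]^{-(n+k+2\alpha)/2}, \quad W_k \in R^k, \tag{16}$$

and $W_0$ does not depend on $W(1)$, $W(2)$, etc.

The other quantities are

$$A = A_1 + A_2 + P, \tag{17}$$

$$B = B_1 + B_2 + P\mu, \tag{18}$$

$$C = \sum_{t=1}^{n} Y^2(t) + \sum_{s=1}^{k} W^2(s) + 2\beta, \tag{19}$$

where $A_1$ and $A_2$ are the $p \times p$ matrices

$$A_1 = \Big[\sum_{t=1}^{n} Y(t-j)Y(t-l)\Big], \quad 1 \le j, l \le p, \tag{20}$$

$$A_2 = \Big[\sum_{s=1}^{k} W(s-j)W(s-l)\Big], \quad 1 \le j, l \le p. \tag{21}$$

The $p \times 1$ vectors $B_1$ and $B_2$ are given by

$$B_1 = \Big[\sum_{t=1}^{n} Y(t)Y(t-j)\Big], \quad 1 \le j \le p \tag{22}$$

and

$$B_2 = \Big[\sum_{s=1}^{k} W(s)W(s-j)\Big], \quad 1 \le j \le p. \tag{23}$$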
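As an illustration of how the sample quantities enter these formulas, the sketch below builds the matrix $A_1$ and the vector $B_1$ from the observations and the initial values. The function name `ar_cross_products` and the list-based layout are assumptions made for this example only; $A_2$ and $B_2$ follow the same pattern with the future observations $W(s)$ in place of $Y(t)$.

```python
def ar_cross_products(y, y_init):
    """Return (A1, B1) for an AR(p) sample.

    y      : [Y(1), ..., Y(n)], the observed sample
    y_init : [Y(0), Y(-1), ..., Y(1-p)], the known initial observations
    A1[j-1][l-1] = sum_{t=1}^{n} Y(t-j) Y(t-l)
    B1[j-1]      = sum_{t=1}^{n} Y(t)   Y(t-j)
    """
    p = len(y_init)
    n = len(y)

    def obs(t):
        # Y(t) for t = 1-p, ..., n; non-positive t falls in the initial values.
        return y[t - 1] if t >= 1 else y_init[-t]

    a1 = [[sum(obs(t - j) * obs(t - l) for t in range(1, n + 1))
           for l in range(1, p + 1)]
          for j in range(1, p + 1)]
    b1 = [sum(obs(t) * obs(t - j) for t in range(1, n + 1))
          for j in range(1, p + 1)]
    return a1, b1
```

For a second order model with `y = [2, 1, 3]` and `y_init = [1, 0]`, the call returns the symmetric matrix `[[6, 4], [4, 5]]` and the vector `[7, 7]`.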

Consider equations (14), (15), and (16); then if $k = 2$, $g_1$ depends on $W(1)$, via $A$, but not $W(2)$, and $g_2$ depends on both $W(1)$ and $W(2)$. When $k = 3$, $g_2$ depends on $W_3 = (W(1), W(2), W(3))'$ and $g_1$ only on $W(1)$ and $W(2)$. In general if $k \ge 2$, $g_2$ depends on $W_k$


and the conditional distribution of $W(2)$ given $W(1)$ is given by (16) with $k = 2$, which is a t density with $n + 2\alpha$ degrees of freedom, location

$$E[W(2)|W(1),S_n] = D_2^{-1}E_2 \tag{26}$$

and precision

$$P[W(2)|W(1),S_n] = \frac{(n+2\alpha)D_2}{F_2 - E_2 D_2^{-1} E_2} \tag{27}$$

where

$$D_2 = 1 - G_{-1}'A^{-1}G_{-1},$$

$$E_2 = G_{-1}'A^{-1}[B_1 + P\mu + W(1)G_0],$$

$$F_2 = C - W^2(2) - [B_1 + P\mu + W(1)G_0]'A^{-1}[B_1 + P\mu + W(1)G_0],$$

and

$$G_{-1} = [W(1), W(0), \ldots, W(2-p)]'.$$

(c) If $k = 3$, the predictive distribution of $W_3$ is such that the marginal distribution of $W(1)$ is given by (a), the conditional distribution of $W(2)$ given $W(1)$ is given above by (b), and the conditional density of $W(3)$ given $W(1)$ and $W(2)$ is the t-density (16) with $k = 3$, with $n + 2\alpha$ degrees of freedom, location

$$E[W(3)|W(1),W(2),S_n] = D_3^{-1}E_3 \tag{28}$$

and precision

$$P[W(3)|W(1),W(2),S_n] = \frac{(n+2\alpha)D_3}{F_3 - E_3 D_3^{-1} E_3} \tag{29}$$

where

$$D_3 = 1 - G_{-2}'A^{-1}G_{-2},$$

$$E_3 = G_{-2}'A^{-1}[B_1 + P\mu + W(1)G_0 + W(2)G_{-1}],$$

$$F_3 = C - W^2(3) - [B_1 + P\mu + W(1)G_0 + W(2)G_{-1}]'A^{-1}[B_1 + P\mu + W(1)G_0 + W(2)G_{-1}],$$

and

$$G_{-2} = [W(2), \ldots, W(3-p)]'.$$


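(d) In general if $k \ge 2$, the conditional predictive distribution of $W(k)$ given $W_{k-1} = [W(1), W(2), \ldots, W(k-1)]'$ is t with $n + 2\alpha$ degrees of freedom, location

$$E[W(k)|W_{k-1},S_n] = D_k^{-1}E_k \tag{30}$$

and precision

$$P[W(k)|W_{k-1},S_n] = \frac{(n+2\alpha)D_k}{F_k - E_k D_k^{-1} E_k} \tag{31}$$

where

$$D_k = 1 - G_{-(k-1)}'A^{-1}G_{-(k-1)},$$

$$E_k = G_{-(k-1)}'A^{-1}\Big[B_1 + P\mu + \sum_{j=1}^{k-1} W(j)G_{-(j-1)}\Big],$$

$$F_k = C - W^2(k) - \Big[B_1 + P\mu + \sum_{j=1}^{k-1} W(j)G_{-(j-1)}\Big]'A^{-1}\Big[B_1 + P\mu + \sum_{j=1}^{k-1} W(j)G_{-(j-1)}\Big],$$

where for $i = 0,1,2,\ldots$

$$G_{-i} = [W(i), W(i-1), \ldots, W(i+1-p)]'.$$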

Thus it is seen that the predictive density of $W_k$ may be expressed as the product of $k$ univariate t densities, namely the marginal density of $W(1)$, given by (a), the conditional predictive density of $W(2)$ given $W(1)$, given above by (b), and so on; however, the predictive density of $W_k$ is not the standard k-variate multivariate t density as defined by DeGroot (1970), Press (1972), and Zellner (1971).

What is the predictive distribution of $W_k$ if one uses Jeffreys' prior density $\xi_1$, (2), in lieu of the conjugate prior density $\xi_2$, (3)? Fortunately, one may revise the previous theorem and thereby produce the predictive distribution.


THEOREM  2 


If $\{Y(t): t = 0, \pm 1, \pm 2, \ldots\}$ is an AR(p) process with unknown parameters $\theta \in R^p$ and $\tau > 0$, $S_n$ a sample of $n$ observations, $n > p$, $Y(0), Y(-1), \ldots, Y(1-p)$ known real constants, and if the prior density of $\theta$ and $\tau$ is $\xi_1$, (2), the predictive distribution of $W_k$ is given by Theorem 1 by letting $P \to 0\,(p \times p)$, $\alpha \to -p/2$, and $\beta \to 0$ in equations (24) through (31).

In particular, consider a first order model, $p = 1$, with Jeffreys' prior density and a one-step ahead forecast, $k = 1$; then what is the predictive density of $W(1)$? According to Theorem 2, part (a) of Theorem 1 gives the solution with $\alpha \to -1/2$, $\beta \to 0$, and $P \to 0$ substituted into equations (24) and (25). This gives a t distribution for $W(1)$ with $n-1$ degrees of freedom, location

$$E[W(1)|S_n] = \frac{Y(n)\sum_{t=1}^{n} Y(t)Y(t-1)}{\sum_{t=0}^{n-1} Y^2(t)} \tag{32}$$

and precision

$$P[W(1)|S_n] = \frac{(n-1)\sum_{t=0}^{n-1} Y^2(t)}{\Big[\sum_{t=0}^{n} Y^2(t)\Big]\Big[\sum_{t=1}^{n} Y^2(t)\Big] - \Big[\sum_{t=1}^{n} Y(t)Y(t-1)\Big]^2 \Big[\sum_{t=0}^{n} Y^2(t)\Big]\Big[\sum_{t=0}^{n-1} Y^2(t)\Big]^{-1}}. \tag{33}$$

Using the vague prior, the mean of the predictive distribution of $W(1)$ is

$$\hat{Y}(n+1) = \hat{\theta}\,Y(n),$$

where the posterior mean of $\theta$,

$$\hat{\theta} = \sum_{t=1}^{n} Y(t)Y(t-1) \Big/ \sum_{t=0}^{n-1} Y^2(t), \tag{34}$$

is an estimate of $\theta$, the autoregressive parameter, which when $|\theta| < 1$, is the autocorrelation between successive observations.


A straightforward way to predict $Y(n+1)$ is to note

$$Y(n+1) = \theta Y(n) + e(n+1),$$

thus $E\,Y(n+1) = \theta Y(n)$, where the average $E$ is taken with respect to $e(n+1)$ given $Y(n)$; then $E\,Y(n+1)$ is estimated by $\theta^* Y(n)$, where $\theta^*$ is some estimate of $\theta$, say the mean, (34), of the posterior distribution of $\theta$. Hence the Bayesian way of point forecasting with the predictive mean conforms to the usual way one would attempt to solve the problem.

If one would want to forecast $W(1)$, one could use an interval prediction based on the predictive distribution of $W(1)$, which is t with $n-1$ degrees of freedom, location given by (32) and precision given by (33). The predictive variance of $W(1)$ is

$$\mathrm{Var}[W(1)|S_n] = (n-1)(n-3)^{-1}P^{-1}[W(1)|S_n], \tag{34}$$

thus

$$E[W(1)|S_n] \pm t_{\gamma/2,\,n-1}\sqrt{\mathrm{Var}[W(1)|S_n]} \tag{35}$$

is a $1-\gamma$, $0 < \gamma < 1$, prediction interval of $Y(n+1)$ and is easily computed with the aid of Student's t tables. The intervals have the HPD (highest posterior density) property explained by Box and Tiao (1965).
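The interval computation just described reduces to a few arithmetic steps once the location, precision, and degrees of freedom are known. The sketch below follows the formulas as stated in the text; the function name `prediction_interval` is hypothetical, and the t quantile is passed in as an argument because it is assumed to be read from tables.

```python
import math


def prediction_interval(location, precision, dof, t_quantile):
    """Prediction interval built from a predictive t distribution.

    location, precision, dof : parameters of the predictive t distribution
    t_quantile               : upper gamma/2 point of the t distribution
                               (supplied by the user, e.g. from tables)
    Returns (lower, upper, variance).
    """
    if dof <= 2:
        raise ValueError("the predictive variance requires more than 2 degrees of freedom")
    # Variance of a t distribution: dof / (dof - 2) times the inverse precision;
    # with dof = n - 1 this is (n-1)(n-3)^{-1} P^{-1}.
    variance = dof / (dof - 2.0) / precision
    half_width = t_quantile * math.sqrt(variance)
    return location - half_width, location + half_width, variance
```

With `location=3.5`, `precision=0.5`, `dof=4`, and `t_quantile=2.0`, the variance is 4 and the interval is (-0.5, 7.5).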

Land  (1981)  gives  some  examples  of  one  and  two-step  ahead 
forecasting,  via  Theorem  2,  with  an  AR(1)  process  and  Jeffreys' 
prior  distribution. 

4.  A  Numerical  Study 

In this section of the study, an AR(1) model

$$Y(t) = \phi Y(t-1) + e(t), \quad t = 1,2,\ldots,n \tag{36}$$

is considered, where $Y(t)$ is the observation at time $t$, $Y(0)$ is a known constant, $\phi \in R$ is the unknown autoregressive parameter and the $e(t)$, $t = 1,2,\ldots,n$, are n.i.d. $(0, \tau^{-1})$, where $\tau > 0$ is unknown. Suppose the prior density for $\phi$ and $\tau$ is

$$\xi(\phi,\tau) \propto \tau^{1/2} e^{-\frac{\tau}{2}(\phi-\mu)^2 P}\, \tau^{\alpha-1} e^{-\tau\beta}, \quad \phi \in R, \ \tau > 0, \tag{37}$$

which is a normal-gamma density with parameters $\mu \in R$, $P > 0$, $\alpha > 0$ and $\beta > 0$. The marginal prior density of $\tau$ is gamma with parameters $\alpha > 0$ and $\beta > 0$ and the marginal prior density of $\phi$ is

$$\xi(\phi) \propto [2\beta + (\phi-\mu)^2 P]^{-(1+2\alpha)/2}, \quad \phi \in R; \tag{38}$$

thus, a priori, $\phi$ has a t distribution with $2\alpha$ degrees of freedom, location $\mu$, precision $(2\alpha)P(2\beta)^{-1}$ and variance $\beta(\alpha-1)^{-1}P^{-1}$, when $\alpha > 1$.
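The prior moments implied by the normal-gamma density follow from standard gamma and t results, and a short sketch makes them easy to tabulate. The function name `prior_summaries` and the dictionary keys are illustrative choices, not part of the original study.

```python
def prior_summaries(alpha, beta, P):
    """Prior moments implied by the normal-gamma density (37).

    tau is gamma with parameters alpha and beta (beta acting as a rate);
    phi is marginally t with 2*alpha degrees of freedom.
    """
    if alpha <= 1:
        raise ValueError("the prior variance of phi requires alpha > 1")
    return {
        "tau_mean": alpha / beta,
        "tau_var": alpha / beta ** 2,
        "phi_precision": (2.0 * alpha) * P / (2.0 * beta),
        "phi_var": beta / ((alpha - 1.0) * P),
    }
```

When `beta` equals `alpha - 1`, the prior variance of the autoregressive parameter reduces to the reciprocal of `P`.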

Suppose one believes that the process is 'almost' stationary; then one would want 'most' of the marginal prior probability distribution of $\phi$ to be concentrated over $(-1,+1)$. For example, suppose one wants $\phi \in (-1,1)$ with a preassigned probability of $1-\gamma$, where $0 < \gamma < 1$; then, assuming $\alpha$, $\beta$, and $\mu$ are fixed, one would choose $P$ such that

$$P = \beta\, t^2_{\gamma/2,\,2\alpha}\,(1-\mu)^{-2}(\alpha-1)^{-1}, \quad \mu \ge 0, \tag{39}$$

where $t_{\gamma/2,\,2\alpha}$ is the upper $100(\gamma/2)\%$ point of the t-distribution with $2\alpha$ degrees of freedom. Hence $\alpha$ and $\beta$ are chosen to express the prior opinion of $\tau$, $\mu$ is chosen as one's initial guess of the value of $\phi$, and then $P$ is determined from (39).

In this way, one may express one's prior opinion of an almost ($1-\gamma$ close to 1) stationary AR(1) process (36).
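The choice of the prior precision from (39) can be sketched as a one-line computation. The function name `prior_precision` is hypothetical, the t quantile is assumed to be taken from tables, and the restriction of the prior mean to the interval [0, 1) is an assumption of this example that matches the values used in the numerical study.

```python
def prior_precision(alpha, beta, mu, t_quantile):
    """Prior precision P of the autoregressive parameter from (39).

    alpha, beta : gamma prior parameters of tau (alpha > 1)
    mu          : prior mean of the autoregressive parameter, assumed in [0, 1)
    t_quantile  : upper gamma/2 point of the t distribution with 2*alpha
                  degrees of freedom, supplied by the user
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1")
    if not 0.0 <= mu < 1.0:
        raise ValueError("this sketch assumes 0 <= mu < 1")
    return beta * t_quantile ** 2 / ((alpha - 1.0) * (1.0 - mu) ** 2)
```

For example, with `alpha=10`, `beta=9`, `mu=0`, and the tabled value 2.086 for the upper 2.5% point of the t distribution with 20 degrees of freedom, the reciprocal of the result is about .2298, which agrees with the prior variance reported for that case in the numerical study.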

Now, suppose we want to forecast a future observation $Y(n+1)$ based on a sample $S_n$, when the observations were generated from an AR(1) process which is almost stationary. Clearly, part (a) of Theorem 1 applies, and the following tables were computed using formulas (24) and (25) of that theorem.

Using the normal-gamma prior (37) for $\phi$ and $\tau$, samples $S_n$, where $n$ (= 25, 50, 100, 750), were generated with $\phi$ = 0.0, .25, .50, .75, and .90, $\tau = 1$, and $Y(0) = 0$. The parameters of the prior distribution were $\alpha = 10$, $\beta = \alpha - 1$, $\mu$ = 0.0, 0.25, 0.5, 0.75, and 0.90, and $P$ was determined from (39) with $\gamma = .05$, that is, 95% of the marginal prior distribution of $\phi$ is concentrated over $(-1,1)$, with a prior mean of $\mu$ and a prior variance of $P^{-1}$.

Consider Table 2, where the sample $S_n$, $n$ = 25, 50, 100, 750, was generated from the AR(1) model $Y(t) = .25\,Y(t-1) + e(t)$, with $Y(0) = 0$ and $e(t) \sim n(0,1)$. The parameters of the prior distribution were chosen as $\alpha = 10$, $\beta = 9$, $\mu$ = 0, .25, .50, .75, .9, which determined $P$ from (39) with $\gamma = .05$. These tables consist of three parts, namely the prior, posterior, and predictive information; thus the first row corresponds to $n = 25$, $\mu = 0$, $P^{-1} = .2298$, $E(\tau) = \alpha/\beta = 1.1111$, $\mathrm{var}(\tau) = \alpha/\beta^2 = .1235$, the marginal posterior mean of $\phi$ is .3004, the marginal posterior variance of $\phi$ is .0299, the mean of the predictive distribution of $Y(n+1)$ is .4189, which is calculated from (24), and the predictive variance of $Y(n+1)$ is 1.2954, which was computed from (25). Each table corresponds to a particular AR(1) model, and the tables are as follows.


What can be concluded from this numerical study? First, one sees the influence of the sample size $n$ and the prior mean $\mu$ on the posterior mean and variance and the predictive mean and variance of $Y(n+1)$. The tables show the anticipated reduction in the posterior variance of $\phi$ and the predictive variance of the future observation as the sample size increases. As the prior mean of $\phi$ increases toward one, for the same AR(1) series (same value of $\phi$) and the same series length $n$, the posterior variance of $\phi$ decreases, because as $\mu$ increases to one, $P^{-1}$, the prior variance of $\phi$, decreases, as can be verified from (39). Of course, this should happen since, as the prior variance of $\phi$ becomes smaller, so should the posterior variance of $\phi$.

Also the tables show that for the same value of $\mu$ and $n$, the posterior variance of $\phi$ decreases as the true value of $\phi$ increases from 0 to .90. Of course, this is anticipated from the theory of time series, because the usual estimator of $\phi$ has a large-sample variance of $n^{-1}(1-\phi^2)$; see Box and Jenkins (1970).
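The large-sample variance quoted above is simple to tabulate for the values of the autoregressive parameter used in the study. The function name `large_sample_variance` is an illustrative choice for this sketch.

```python
def large_sample_variance(phi, n):
    """Large-sample variance (1 - phi^2) / n of the usual estimator
    of the AR(1) parameter phi, for a series of length n."""
    return (1.0 - phi * phi) / n


# Variance for each 'true' parameter value of the study at n = 25.
variances = [large_sample_variance(phi, 25) for phi in (0.0, 0.25, 0.5, 0.75, 0.9)]
```

The resulting list is strictly decreasing, from 0.04 at a parameter value of 0 to about 0.0076 at 0.9, which is the pattern the tables are described as showing for the posterior variance.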

Results for the predictive density show that its mean is highly influenced by the value of the last observation $Y(n)$ and its variance by the sample size, the variance decreasing as the sample size increases.

Given a particular AR(1) model, say that given by Table 4, one would expect the predictive variance to be the smallest when $\mu = .75$ and $n = 750$, and this is indeed the case. The same holds for the other four tables. For example, with Table 1 (when the 'true' value of $\phi = 0$), the predictive variance is smallest when $\mu = 0$ and $n = 750$, which is the largest sample size.


Predictive intervals for one-step ahead forecasts are relatively easy to find. Suppose the AR(1) model with $\phi = .5$ (Table 3) is used to generate 25 observations and one's prior belief is based on $\mu = .75$, $\alpha = 10$, $\beta = 9$, and the $P$ value given by (39), i.e., one is assuming the process is stationary with a prior probability of 95%. A 90% prediction interval for $Y(26)$ is $1.0821 \pm \sqrt{1.2917}\; t_{.05,65}$, which is (-0.817, 2.9812), and this follows from part (a) of Theorem 1.
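The arithmetic behind this interval can be checked with a few lines. The sketch below assumes that 1.2917 is the predictive variance, so that its square root multiplies the t quantile, and it assumes a tabled quantile of about 1.671; both are assumptions made to reproduce the reported end points and are not stated explicitly in the text.

```python
import math

center = 1.0821      # predictive mean reported in the text
variance = 1.2917    # assumed to be the predictive variance
t_quantile = 1.671   # assumed tabled upper 5% point of the t distribution

half_width = t_quantile * math.sqrt(variance)
lower = center - half_width
upper = center + half_width
```

Under these assumptions the end points agree with the reported interval (-0.817, 2.9812) to about three decimal places.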

5.  Comments  and  Conclusions 

It has been shown that if an AR(p) is an adequate model for a time series and if prior information is expressed as either a conjugate prior density or a Jeffreys' vague improper density, the predictive distribution of $k$ future observations is characterized by the product of $k$ univariate t densities. Theorems 1 and 2 give the particular details of the predictive distribution.

These theorems provide one with a way to make point and interval forecasts of future observations. For example, the mean of the predictive distribution gives a point forecast, and the mean together with the variance of the predictive distribution allows one to construct interval forecasts. The theorems also give a way to make one-step, two-step, and other multi-step predictions.

The  numerical  study  illustrates  the  sensitivity  of  the 


The calculations reveal that the variance of the one-step ahead forecasts is a minimum for the largest sample size and when the prior mean of the autoregressive parameter coincides with the 'true' value of that parameter. The prior distribution of the parameters of the AR(1) models was chosen to express near stationarity of the process.

It would be interesting to develop the predictive distribution of vector autoregressive processes. One would expect the forecasting distribution to be characterized by the product of multivariate t densities.